This article describes a biomimetic control architecture affording an animat both action
selection and navigation functionalities. It satisfies the survival constraint of an artificial 
metabolism and supports several complementary navigation strategies. It builds upon 
an action selection model based on the basal ganglia of the vertebrate brain, using two
interconnected cortico-basal ganglia-thalamo-cortical loops: a ventral one concerned with
appetitive actions and a dorsal one dedicated to consummatory actions. 

The performance of the resulting model is evaluated in simulation. The experiments
assess the prolonged survival permitted by the use of high level navigation strategies and
the complementarity of navigation strategies in dynamic environments. The correctness
of the behavioral choices in situations of antagonistic or synergetic internal states is
also tested. Finally, the modelling choices are discussed with regard to their biomimetic
plausibility, while the experimental results are assessed in terms of animat adaptivity.

Keywords: action selection, navigation, basal ganglia, computational neuroscience 
Short title: Basal ganglia model of action selection and navigation. 



1 Introduction 

The work described in this paper contributes to the Psikharpax project, which aims at
building the control architecture of a robot reproducing as accurately as possible the current
knowledge of the rat's nervous system (Filliat et al., 2004); it thus concerns biomimetic
modelling derived from data gathered with rats. The main purpose of the Psikharpax
project is to refocus on the seminal objective advocated by the animat approach: building
"a whole iguana" (Dennett, 1978), instead of designing isolated and disembodied functions.
Indeed, in the animat literature, a great deal of work is devoted to the design of isolated
control architectures that provide either action selection or navigation abilities, two
fundamental functions for an autonomous system. The main objective of robotic navigation
architectures is to provide an animat with various orientation strategies, like dead-reckoning,
taxon navigation, place-recognition or planning (Filliat and Meyer, 2003, Meyer and
Filliat, 2003 for reviews). The main objective of action selection architectures is to maintain
the animat within its "viability zone", defined by the state space of its "essential variables"
(Ashby, 1952), through efficient switches between various actions (Prescott et al., 1999
for a review). Even if there is evidence that an effective animat requires the use of these
two functionalities, few models attempt to integrate them, taking into account the specific
characteristics of each.

On the one hand, most of the navigation models insert arbitration mechanisms typical 
of action selection to solve spatial issues (e.g., Rosenblatt and Payton, 1989), but they do 
not take into account motivational constraints. 

On the other hand, action selection models always integrate navigation capacities en- 
suring an animat the ability to reach resources in the environment, but they typically 
implement only rudimentary navigation strategies -random walk and taxon navigation- 
(e.g., Maes, 1991, Seth, 1998). 

The few models that process both navigation and action selection issues are inspired by
biological considerations, indicating that the hippocampal formation, in association with
the prefrontal cortex, processes spatial information (O'Keefe and Nadel, 1978), whereas
the basal ganglia are hypothesized to be a possible neural substrate for action selection in
the vertebrate brain (Redgrave et al., 1999).



For example, Arleo and Gerstner (2000) propose a model of the hippocampus that
elaborates an internal map with the creation of several "place cells", used by an animat to reach
two different kinds of resources providing rewards. The outputs of the model are assumed
to be four action cells, coding for displacements in cardinal directions, and assumed to
belong to the nucleus accumbens. This nucleus, located in the ventral part of the basal
ganglia, is hypothesized to integrate sensorimotor, motivational and spatial information
(Kelley, 1993). In this model, it selects the actual displacement by averaging the ensemble
activity of the action cells. However, the animat does not select other navigation strategies
and does not have a virtual metabolism that puts constraints on the timing and efficiency
of the selection of its behaviors.

Guazzelli et al. (1998) endow their simulated animat with two navigation strategies 
(place-recognition-triggered and taxon navigation, processed by hippocampus and pre- 
frontal cortex) and homeostatic motivational systems (hunger and thirst, processed by 
hypothalamus). Here, the role of the basal ganglia is limited to computing reinforcement
signals associated with motivational states, while action selection proper occurs in
the premotor cortex. Yet, in this work, there are no virtual metabolism constraints on
action selection and, because of the choice of a systems-interaction level of modelling, the
internal operation of the modules is not specifically biomimetic.

Gaussier et al. (2000) endow a motivated robot (Koala™, K-Team) with a virtual 
metabolism -generating signals of hunger, thirst and fatigue- and a topological navigation 
capacity. A topological map is built in the hippocampus and used to build a graph of 
transitions between places in the prefrontal cortex, used for path planning. The motor 
output is assumed to be effected by action neurons in the nucleus accumbens, coding
for three egocentric motions (turn right, left, go straight). Motivational needs affect path
planning by spreading activation into the prefrontal graph from the desired resources to the
current location of the animat. They are transmitted to the action neurons, allowing the
animat to reach one goal by several alternative paths, and to make compromises between
different needs. Here, only one navigation strategy is used, whereas various complementary
strategies coexist in animals.

These models do not entirely satisfy the objectives of the fundamental functions, that
is, dealing with survival constraints together with taking advantage of various complemen- 
tary navigational strategies. Moreover, they do not exploit recent neurobiological findings 
concerning neural circuits devoted to the integration of these functions, involving two paral- 
lel and interconnected "cortico-basal ganglia-thalamo-cortical" loops (CBGTC, Alexander 
et al., 1986), stacked on a dorsal to ventral axis, receiving sensorimotor (dorsal loop) and 
spatial (ventral loop) information. 

We previously tested a computational model of action selection, inspired by the dorsal
loop and designed by Gurney et al. (2001a,b, referred to here as 'GPR' after the
authors' names), by replicating the Montes-Gonzalez et al. (2000) implementation in a
survival task (Girard et al., 2003). To improve the survival of an artificial system in a complex
environment, our objective is to add to this architecture a second circuit -simulating the
ventral loop- which selects locomotor actions according to various navigation strategies: a
taxon strategy, directing the animat towards the closest resource perceived, a topological
navigation, building a map of the different places in the environment and using it for path
planning, together with random exploration, mandatory to map unknown areas and allowing
the discovery of resources by chance. The interconnection of the dorsal and ventral
loops is designed by means of bioinspired hypotheses. The whole model will be validated
in several environments where the animat performs a simple survival task.

After describing the navigation and action selection systems and how they are inter- 
connected, we will introduce the specific experimental setup (survival task and animat 
configuration). The results will concern tests on the animat's specific adaptive mecha- 
nisms and behaviors, involving topological and taxon navigation, opportunistic ability and 
conflict management in case of changes in the environment or internal state. 

2 The control architecture 

This model has been introduced in a brief preliminary form in Girard et al. (2004).

2.1 Navigation

The choice of the navigation model was based on functional and efficiency criteria: it had 
to provide the animat with the capabilities of building a cognitive map, localizing itself 
with respect to it, storing the location of resources and computing directions to reach these 
resources; these operations had to be performed in real time and had to be robust enough 
to cope with the physical limitations of a real robot. The navigation system proposed by
Filliat (2001) was chosen as it provides the required features and has been validated on a
real robot (Pioneer™, ActivMedia).

This model emulates hippocampal and prefrontal cortex functions. It builds a dense
topological map in which nodes store the allothetic sensory input that the animat can
perceive at respective places in the environment. These inputs are mean gray levels perceived
by a panoramic camera in each of 36 surrounding directions, and sonar readings providing 
distances to obstacles in eight surrounding directions. A link between two nodes memorizes 
at which distance and in which direction the corresponding places are positioned relative to 
each other, as measured by the idiothetic sensors of odometry. The position of the animat 
is represented by a probability distribution over the nodes. 

The model also provides an estimate of disorientation (D), which varies from 0 when
the estimate of location is good, to 1 when it is poor. D increases when the robot is
creating new nodes (it is in an unmapped area) and only decreases when it spends time 
in well known areas. The model also provides two 36-component vectors indicating which 
directions to follow in order to either explore unmapped areas (Expl) or go back to known 
areas in order to decrease disorientation (BKA). If the animat does not regularly go back 
to known areas when it is very disoriented, the resulting cognitive map will not be reliable. 
Consequently, the addition of topological navigation to an action selection mechanism will
put a new constraint on the latter, that of keeping Disorientation as low as possible.

We provided the model with the ability to learn the location of resources important
to survival (e.g. loading station, dangerous area) in the topological map. This location is
learned by associating active nodes of the graph with the type of resources encountered,
using Hebbian learning. By specifying the type of resource currently needed to a path planning
algorithm applied on the graph, a vector P of 36 values is produced, representing the proximity of
that resource in 36 directions spaced by 10°. Such a vector can be produced for each type of
resource res, weighted by the motivation associated with that resource m(res), and combined
with the other ones to produce a generic path planning vector Plan. The combination is
processed as follows:

Plan = 1 - \prod_{res} (1 - m(res) \times P(res))    (1)
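
As an illustration, the combination can be sketched in Python, assuming that equation (1) is read as Plan = 1 - prod over res of (1 - m(res) x P(res)), applied independently to each of the 36 directions. The function name, the dictionary layout and the numerical values are hypothetical and only serve the example.

```python
N_DIRECTIONS = 36  # one component per 10-degree sector


def combine_plans(proximities, motivations):
    """Combine per-resource proximity vectors into one Plan vector.

    Assumed reading of eqn. 1, applied direction by direction:
    Plan[d] = 1 - prod_res (1 - m(res) * P(res)[d])
    """
    plan = []
    for d in range(N_DIRECTIONS):
        complement = 1.0
        for res, p in proximities.items():
            complement *= 1.0 - motivations[res] * p[d]
        plan.append(1.0 - complement)
    return plan


# Hypothetical proximity vectors for two resource types.
p_e = [0.0] * N_DIRECTIONS
p_e[0] = 0.8
p_ep = [0.0] * N_DIRECTIONS
p_ep[0] = 0.5
p_ep[9] = 1.0

plan = combine_plans({"E": p_e, "Ep": p_ep}, {"E": 0.5, "Ep": 1.0})
```

Under this reading, a direction supported by several motivated resources receives a higher Plan value than any single resource would give, while the result never exceeds 1.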

2.2 Action Selection System 

-Figure [2 around here- 

The action selection model presented here is an extension of the one used in Girard
et al. (2003), the GPR model (Gurney et al., 2001a). It is a neural network model
built with leaky-integrator neurons, in which each nucleus of the basal ganglia (BG) is
subdivided into distinct channels, each modelled by one neuron (Figure 1) and each
associated with an elementary action. Each channel of a given nucleus projects to a specific
channel in the target nucleus, thereby preserving the channel structure from the input to the
output of the BG circuit. The subthalamic nucleus (STN) is an exception as its excitation seems
to be diffuse. Inputs to the BG channels are Salience values, assumed to be computed in
specific areas in the cortex, and representing the commitment to perform the associated
action. They take into account internal and external perceptions, together with a positive
feedback signal coming from the thalamo-cortical circuit, which introduces some persistence
in the action performance. Two parallel selection and control circuits within the basal
ganglia serve to modulate interactions between channels. Finally, the selection operates
via disinhibition (Chevalier and Deniau, 1990): at rest, the BG output nuclei are tonically
active and keep their thalamic and motor system targets under constant inhibition. The
output channel that is the least inhibited is selected, and the corresponding action executed.
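
The two ingredients named above, leaky-integrator dynamics and selection by disinhibition, can be illustrated with a minimal Python sketch. It is not the GPR model itself: the Euler integration, the time constant, the piecewise-linear output function and all numerical values are assumptions made for the example.

```python
def leaky_step(activation, net_input, dt=0.001, tau=0.025):
    """One Euler step of tau * da/dt = net_input - a (assumed form)."""
    return activation + (dt / tau) * (net_input - activation)


def bounded_output(activation, threshold=0.0, slope=1.0):
    """Piecewise-linear output clipped to [0, 1] (assumed form)."""
    return min(1.0, max(0.0, slope * (activation - threshold)))


def least_inhibited(bg_outputs):
    """Index of the output channel with the lowest inhibitory activity."""
    return min(range(len(bg_outputs)), key=lambda i: bg_outputs[i])


# Hypothetical example: one neuron driven by a constant input of 0.6
# relaxes towards that input.
a = 0.0
for _ in range(500):
    a = leaky_step(a, 0.6)

# Hypothetical BG output levels for three channels: the channel with
# the weakest output (index 1) is the one that gets selected.
selected = least_inhibited([0.30, 0.05, 0.20])
```

The point of the sketch is that selection picks the minimum of the output activities, not the maximum, since the outputs are inhibitory.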

A principal original feature of our model is that two parallel CBGTC loops are modelled, 
one selecting consummatory actions and the other appetitive actions. 

2.2.1 Dorsal loop 

In the BG, the dorsal loop is implicated in the selection of motor responses in reaction to
sensorimotor inputs and corresponds to the one modelled in the previous robotic studies
of the GPR (Montes-Gonzalez et al., 2000; Girard et al., 2003). Here we hypothesize that
it will direct the selection of non-locomotor actions, which in the present case are limited
to consummatory actions (robotic equivalents of eating, resting, etc.) (Figure 2). In this
loop:

• input Saliences are computed with internal and external sensory data; 

• at the output, a "winner-takes-all" selection occurs for the most disinhibited channel, 
as simultaneous partial execution of both reloading behaviors doesn't make sense. 

-Figure 2 around here-



2.2.2 Ventral loop 



The ventral loop can be subdivided into two distinct subloops (Thierry et al., 2000),
originating from the core and shell regions of its input nucleus (nucleus accumbens or NAcc)
(Zahm and Brog, 1992). In the present work, we will only retain the core subloop (which will
henceforth also be called the ventral loop), which has been proposed to play a role in navigation
towards rewarding places (Mulder et al., 2004; Martin and Ono, 2000). The interactions
between the hippocampus, the prefrontal cortex and the NAcc core (Thierry et al., 2000)
could be the substrate of a topological navigation strategy. Taxon navigation needs sensory
information only and could therefore be implemented in the dorsal loop. However, it was
reported that the lesion of the NAcc also impairs object approach (Seamans and Phillips,
1994). This is why, in our model, this strategy will also be managed by the ventral loop.

To summarize, we hypothesize that this loop will direct appetitive actions (robotic
equivalents of looking for food, homing, etc.), suggesting displacements towards motivated
goals (Figure 2).

The ventral loop is very similar -anatomically and physiologically- to the circuits of the
dorsal loop: the dorsolateral ventral pallidum plays a role similar to the GP (Maurice et
al., 1997), the medial STN is dedicated to the ventral circuits (Parent and Hazrati) as
well as the dorsomedial part of the SNr (Maurice et al.). Thus, despite probable
differences concerning the influence of dopamine on ventral and dorsal input nuclei, it is
also modelled by a GPR model. However, a few differences are to be noted:



• Saliences are computed with internal and external sensory data: the taxon navigation
needs distal sensory inputs to select a direction and all navigation strategies are
modulated by the motivations. Additional data coming from the navigation system
propose motions on the basis of a topological navigation strategy and map updates
of current positions;

• each nucleus is composed of 36 channels, representing allocentric displacement direc- 
tions separated by 10°; 

• the lateral inhibitions which occur in the nucleus accumbens core are no longer uni- 
form as in the dorsal loop, but increase with the angular distance between two 
channels (see eqn. EJ), so that close directions compete less than opposite ones; 

• at the output, the selection makes a compromise among all channels disinhibited 
above a fixed threshold. The direction chosen by the animat is computed by a vector 
sum of these channels, weighted by their magnitudes of disinhibition. 
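
The last two points of the list can be sketched as follows. The linear growth of lateral inhibition with angular distance, the threshold value and the example disinhibition profile are assumptions made for illustration; the exact expressions are those of the model's mathematical description, not of this sketch.

```python
import math

N_CHANNELS = 36  # one channel per 10-degree allocentric direction


def angular_distance(i, j, n=N_CHANNELS):
    """Number of 10-degree steps separating channels i and j on the circle."""
    d = abs(i - j) % n
    return min(d, n - d)


def lateral_weight(i, j, max_weight=1.0, n=N_CHANNELS):
    """Assumed linear growth of inhibition with angular distance."""
    return max_weight * angular_distance(i, j, n) / (n // 2)


def chosen_direction(disinhibition, threshold=0.1):
    """Heading in degrees from the vector sum of channels above threshold.

    Each selected channel contributes a vector along its own direction,
    weighted by its level of disinhibition. Returns None when no channel
    is disinhibited above the threshold.
    """
    x = y = 0.0
    selected = False
    for k, level in enumerate(disinhibition):
        if level > threshold:
            angle = math.radians(10 * k)
            x += level * math.cos(angle)
            y += level * math.sin(angle)
            selected = True
    if not selected:
        return None
    return math.degrees(math.atan2(y, x)) % 360.0


# Hypothetical disinhibition profile: two equally active channels at
# 0 and 20 degrees yield a compromise heading halfway between them.
levels = [0.0] * N_CHANNELS
levels[0] = 0.5
levels[2] = 0.5
heading = chosen_direction(levels)
```

A compromise of this kind is only meaningful for neighbouring directions, which is one reason for letting opposite directions compete more strongly than close ones.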

2.2.3 Interconnection of Basal Ganglia loops 

Interconnections between the parallel CBGTC loops are needed to coordinate their respective
selection processes. This is especially true here, when selections concerning navigation
taken in the ventral loop -like following a planned path leading to a resource- might be
conflicting with behavioral choices made by the dorsal loop -like resting. Four main
hypotheses concerning interconnections between loops have been proposed in the rat's brain.
Two of them (Hierarchical pathway (Joel and Weiner, 1994) and Dopaminergic hierarchical
pathway (Joel and Weiner, 2000)) were discarded because they only allow unidirectional
communication from ventral to dorsal loops, whereas bidirectional or dorsal-to-ventral
communication was necessary to solve our conflicts. The two remaining possibilities are (1) the
Cortico-cortical pathway: cortical interconnections between areas involved in different loops
could allow bidirectional flows of information between loops; and (2) the Trans-subthalamic
pathway (Kolomiets et al., 2001, 2003): the segregation of loops is not perfectly preserved
at the level of the STN, as some neurons belonging to one loop are excited by cortical areas
belonging to other loops; thus, parts of the SNr belonging to one loop can be excited by
another loop (Figure 2).

We implemented the trans-subthalamic hypothesis, by distributing dorsal STN activation
to the ventral outputs (see the appendix and Figure 2). Selection of an action in the dorsal
loop increases activity in the dorsal STN, which in turn increases activation of the ventral
outputs, preventing any movement from occurring.
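
The qualitative effect of this coupling can be sketched as follows. The additive form, the coupling weight, the resting output level and the threshold are all hypothetical; the sketch only illustrates how extra excitation of the ventral outputs removes the disinhibition needed for movement.

```python
def couple_ventral_outputs(ventral_outputs, dorsal_stn, weight=1.0):
    """Add the dorsal STN excitation to every ventral output channel."""
    return [min(1.0, out + weight * dorsal_stn) for out in ventral_outputs]


def movement_possible(outputs, rest_level=0.2, threshold=0.1):
    """True if at least one channel is disinhibited by more than threshold.

    Disinhibition is measured here as the drop of an output below its
    assumed resting level.
    """
    return any(rest_level - out > threshold for out in outputs)


# Hypothetical ventral outputs: channel 1 is strongly disinhibited.
ventral = [0.20, 0.05, 0.20]

# No action selected in the dorsal loop: the animat is free to move.
free = movement_possible(couple_ventral_outputs(ventral, dorsal_stn=0.0))

# A dorsal action is selected: dorsal STN activity raises all ventral
# outputs and no direction remains disinhibited.
blocked = movement_possible(couple_ventral_outputs(ventral, dorsal_stn=0.3))
```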

The precise mathematical description of the resulting model is given in appendix A.1.



3 Experimental setup 



3.1 Environment and survival task 



The experiments are performed in simulated 2D environments involving, as in Girard
et al. (2003), the presence of "ingesting" and "digesting" zones, but with the addition of
"dangerous" places. The animat has to reach "ingesting" zones in order to acquire Potential
Energy (Ep), which it should convert into Energy (E) in "digesting" zones, in order to use
it for behavior. Note that a full load of Energy allows the animat to survive only 33 min.
Paths to reach these zones may contain dangerous areas to avoid.

The software used is a simulator programmed in C++, developed in our laboratory.
Walls and obstacles are made of segments colored on a 256 level grayscale. The effects of
lighting conditions are not simulated: the visual sensors have direct access to the color.
The three types of resources are represented by 50cm x 50cm squares of specific colors:
the "ingesting" (Ep), "digesting" (E) and "dangerous" (DA) areas are respectively gray
(127), white (255) and dark gray (31). They can be used by the animat when the distance
between their centre and the centre of the animat is less than 70cm (i.e. when they occupy
more than 60° of the visual field). The other gray objects have no impact on survival but
help the navigation system to discriminate between places.

3.2 The animat 

The animat is circular (30cm diameter), and translation and rotation speeds are 40cm.s^-1
and 10°.s^-1 respectively. Its simulated sensors are:

• an omnidirectional linear camera providing the color of the nearest segment for every 
10° surrounding sector, 

• eight sonars with a 5m range, a directional uncertainty of ±5° and a ±10cm distance
accuracy,

• encoders measuring self-displacements with an error of ±5% of the measured distance, 

• a compass with a ±10° range of error of estimated direction. 

The sonars are used by a low level obstacle avoidance reflex which overrides any decision
taken by the BG model when the animat comes too close to obstacles. The navigation
model uses the camera, encoders and compass inputs. The BG model uses the camera
input to compute nine external variables:

• Three 36-component vectors, Prox(DA), Prox(Ep) and Prox(E), providing the
proximity of each type of resource in each direction. This measure is related to
the angular size of the resource in the visual field with a 10° resolution, as it is
obtained by counting the number of contiguous pixels of the resource color in a 7-pixel
window centered on the direction considered. These vectors are the basis of
the taxon navigation strategy. 

• Three variables, mProx(DA), mProx(Ep) and mProx(E) which are the max values 
of the components of Prox vectors. 

• Three Boolean variables, A(DA), A(Ep) and A(E), which are true if the correspond- 
ing mProx value is one (i.e. if the resource is less than 70cm away and thus usable). 
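
These nine variables can be sketched as follows for one resource type. The normalization of the pixel count by the window width, the use of the longest contiguous run inside the window and the function names are assumptions made for the example; the gray level 127 is the Ep color given in section 3.1.

```python
WINDOW = 7  # pixels, i.e. 70 degrees at one pixel per 10-degree sector


def prox_vector(pixels, color):
    """Prox vector for one resource color from a 36-pixel panoramic image.

    For each direction, counts the longest run of contiguous pixels of
    the given color inside a 7-pixel window centered on that direction,
    and normalizes it by the window width (assumed normalization).
    """
    n = len(pixels)
    half = WINDOW // 2
    prox = []
    for d in range(n):
        best = run = 0
        for offset in range(-half, half + 1):
            if pixels[(d + offset) % n] == color:
                run += 1
                best = max(best, run)
            else:
                run = 0
        prox.append(best / WINDOW)
    return prox


def m_prox(prox):
    """Maximum component of a Prox vector."""
    return max(prox)


def is_available(prox):
    """Boolean A variable: true when the resource fills a whole window."""
    return m_prox(prox) == 1.0


EP_COLOR = 127  # gray level of "ingesting" areas

# Hypothetical image: an Ep resource covering seven contiguous pixels.
near_image = [0] * 36
for k in range(10, 17):
    near_image[k] = EP_COLOR
near = prox_vector(near_image, EP_COLOR)

# Hypothetical image: a more distant Ep resource covering three pixels
# that straddle the 0-degree direction.
far_image = [0] * 36
for k in (34, 35, 0):
    far_image[k] = EP_COLOR
far = prox_vector(far_image, EP_COLOR)
```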

These purely sensory inputs are completed by the vectors produced by the topological 
navigation system: the path planning vector Plan, the exploration vector Expl and the 
"go back to known areas" vector BKA. 

The animat has four internal variables: Energy and Potential Energy, which concern the
survival task (see 3.1), Fear, which is a constant fixing the strength of the repellent effect
of "dangerous areas", and Disorientation, which is provided by the topological navigation
system (see 2.1). From these variables are derived four motivations used in salience
computations and in the weighting of the Plan vector (eqn. 1). The motivations to go
back to known areas and to flee dangerous areas are respectively equal to the Disorientation
and Fear variables, while the motivations to reach Energy and Potential Energy resources
are more complex:



m(DA) = F
m(BKA) = D
m(E) = (1 - E) \sqrt{1 - (1 - E_p)^2}    (2)
m(E_p) = 1 - E_p
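
A direct transcription of these four motivations in Python is given below, assuming that the Energy motivation reads m(E) = (1 - E) sqrt(1 - (1 - Ep)^2). The function and key names are hypothetical, as are the example values.

```python
import math


def motivations(energy, potential_energy, fear, disorientation):
    """Return the four motivations of eqn. 2 as a dictionary."""
    return {
        "DA": fear,
        "BKA": disorientation,
        "E": (1.0 - energy)
        * math.sqrt(1.0 - (1.0 - potential_energy) ** 2),
        "Ep": 1.0 - potential_energy,
    }


# Hypothetical state: full Energy, half of the Potential Energy left.
m_full = motivations(energy=1.0, potential_energy=0.5,
                     fear=0.2, disorientation=0.0)

# Hypothetical state: no Energy and no Potential Energy left.
m_empty = motivations(energy=0.0, potential_energy=0.0,
                      fear=0.2, disorientation=0.0)
```

With this form, the motivation to reach an Energy resource vanishes when no Potential Energy is left to convert (m_empty["E"] is 0), so that the animat is driven towards a Potential Energy resource first.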

The variables used to compute saliences in each loop are summarized in Figure 2 and
the details of these computations are given in appendix A.2.

4 Experiments 

Three different experiments are carried out in simple environments in order to test the 
adaptive mechanisms the animat is provided with. 

Experiment 1 tests the efficiency of the navigation/action selection models interface. 
An animat capable of topological navigation has to survive in an environment containing
one resource of Energy and one resource of Potential Energy which cannot be seen
simultaneously. It is compared to an animat using the taxon strategy only; the use of
topological navigation is expected to improve the survival time.

Experiment 2 tests adaptive action selection in a changing environment: on the one 
hand, the animat has to use a taxon strategy in order to reach newly appeared resources; 
on the other hand, it has to forget the location of exhausted resources to head towards 
abundant ones.

Experiment 3 tests adaptive action selection in case of antagonistic or synergetic
internal states: on the one hand, in a situation where two paths lead to a resource and the
shortest one includes a dangerous area; on the other hand, in a situation where a short 
path leads to one resource only, while a longer one leads to two resources satisfying two 
different needs. 

In experiments 2 and 3, the animat is provided with a previously built map of the 
environment in order to allow statistical comparison of runs with identical initial conditions. 

4.1 Experiment 1: Efficiency of the navigation/action selection interface

In this experiment, an animat traverses the environment (7m x 9m) depicted in Figure 3; it
contains one resource of E and one resource of Ep, but it is impossible to see one resource
from the vicinity of the other. In the first model configuration (condition A), the animat
uses both object approach and topological navigation strategies, whereas in the other one
(condition B), the animat uses object approach only. The "reactive" animat (condition B),
following taxon strategy only, has to rely on random exploration to find hidden resources.
In contrast, after a first phase of random exploration and map building, the animat in
condition A should be able to reach desired resources using its topological map.

-Figure 3 around here-

Ten tests, with a four-hour duration limit, are run for both animats. Energy and
Potential Energy are initially set to 1. The comparison of the median of survival durations
for both sets shows that in condition A, the animat is able to survive significantly longer
(p < 0.01, U-test, see Table 1) than the animat in condition B.

-Table 1 around here-



In Girard et al. (2003), action selection was only constrained by the virtual metabolism.
Here, the addition of the topological navigation system generates a new constraint of limiting
Disorientation. Yet it does not affect the efficiency of action selection, as the life span
of animats is enhanced.



4.2 Experiment 2: Changing environment 

-Figure 4 around here-

This experiment takes place in the 6m x 6m environment depicted in Figure 4, where
the second Potential Energy resource is not always present.

4.2.1 New resources: Coordination of the navigation strategies 

In this case, the second Potential Energy resource is not present during the mapping phase,
so that when the animat reaches the first intersection, it perceives a new resource that is
unknown to the topological navigation system. The topological and the taxon strategies
are thus competing, the first one suggesting a move to the distant resource (Ep1) and
the second to the newly appeared and closer resource (Ep2). For all tests, the animat is
initially placed at the same location, shown in Figure 4, and lacks Potential Energy (E = 1
and Ep = 0.5). The tests are stopped when the animat activates the ReloadEp action.

The control experiment, consisting of ten tests in which resource Ep2 is not added,
results in a repeatable behavior of the animat: it goes directly to Ep1 and activates the
ReloadEp action when close enough to Ep1. Three series of fifteen tests, with different
weightings of the salience computations (variations of the corresponding equation in the
appendix, using the weights of Table 2), are compared by counting how many times the animat
chose one resource versus the other. The results are summarized in Table 2.

-Table 2 around here-

The first weighting corresponds to the configuration used in the previous experiment
(see the appendix). The path planning weight is larger than the taxon strategy one. As a result, the
animat often ignores the new resource and chooses the memorized one. When the relative 
importance of the two strategies is modulated by progressively lowering the path planning 
weight, the behavior of the animat is modified and an opportunistic behavior, where it 
prefers the new and closest resource, can be obtained. 

Consequently, although our control architecture does not intrinsically exhibit an opportunistic
or a pure planning behavior, it can easily be tuned to generate the desired balance between
these two extremes.

4.2.2 Exhausted resources: Forgetting mechanism 

In this situation, resource Ep2 is present during mapping but is removed during the tests. 
The animat then has to "forget" its existence in the map in order to go to the other resource. 

Fifteen tests are carried out, with the animat initially placed at the same start location
(see Figure 4), lacking Potential Energy (E = 1 and Ep = 0.5). The tests are stopped when
the animat activates the ReloadEp action.

The animat first goes to the closest Ep resource coded by the topological navigation
system: the near but absent Ep2 resource. The forgetting mechanism (implemented by the
Hebbian rule used to link resources with locations on the map) allows the animat to finally
leave this area and to reach resource Ep1. The time necessary to forget Ep2 is estimated
by subtracting the duration of the most direct path leading from the start position to Ep1
via Ep2 (46s) from the duration of each test. The mean duration is 178s (σ = 78), i.e. 2
minutes and 58 seconds (max value 5 minutes). This is rather long (almost 10% of the 33
minutes survival duration with a full charge of Energy), but it can be reduced by simply
modifying the gain of the Hebbian rule.
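
The influence of the gain on forgetting time can be illustrated with a minimal sketch. The update below is a generic Hebbian-style rule with decay (closer to a delta rule), chosen for illustration only: the rule actually used in the model is not specified in this section, and the gain values and the 0.1 floor are hypothetical.

```python
def hebbian_update(weight, node_activity, resource_present, gain=0.05):
    """Move a node-resource association towards the observed co-activity.

    The association grows when the resource is perceived at an active
    node and decays when the node is active without the resource.
    """
    target = 1.0 if resource_present else 0.0
    return weight + gain * node_activity * (target - weight)


def steps_to_forget(gain, start=1.0, floor=0.1):
    """Number of updates needed for an association to fall below floor
    while the animat stays at the node and the resource is absent."""
    weight, steps = start, 0
    while weight > floor:
        weight = hebbian_update(weight, 1.0, False, gain)
        steps += 1
    return steps


slow = steps_to_forget(gain=0.05)
fast = steps_to_forget(gain=0.10)

# Share of the 33 min survival time taken by the reported mean
# forgetting time of 178 s.
fraction = 178 / (33 * 60)
```

In this sketch, doubling the gain roughly halves the number of updates needed to forget the resource; the reported mean of 178s amounts to about 9% of the 33 minutes of survival time.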

This shows that the ability to forget, which is necessary to survive in environments 
where resources are exhaustible, operates correctly. 

4.3 Experiment 3: Antagonistic or synergetic internal states 

4.3.1 Antagonistic internal states: Fear vs reloading need 

-Figure 5 around here-

A first experiment is run in an environment (10m x 6m) containing two Ep resources
and a dangerous area blocking direct access to the closest one (Figure 5). The Dangerous
Areas affect the planning algorithm of the topological navigation system in an inhibitory
manner. A path planning vector leading to dangerous areas is computed, multiplied by the
level of Fear and subtracted from the other planning vectors: the term - m(DA) x P(DA)
is added to the computation of Plan described in eqn. 1.
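
One possible reading of this inhibitory term is sketched below: the Fear-weighted danger vector is subtracted from the Plan vector direction by direction. Where exactly the term enters eqn. 1 is not detailed here, so the plain subtraction, the clipping at zero and the shortened three-direction vectors are assumptions made for the example.

```python
def inhibit_plan(plan, p_danger, fear):
    """Subtract the Fear-weighted danger vector from Plan.

    Values are clipped at zero so that a direction cannot become
    negatively attractive (assumption of this sketch).
    """
    return [max(0.0, v - fear * p) for v, p in zip(plan, p_danger)]


# Hypothetical vectors, shortened to three directions for readability
# (the model uses 36): danger lies along the first two directions.
plan = [0.7, 0.2, 0.0]
p_danger = [1.0, 1.0, 0.0]
safe_plan = inhibit_plan(plan, p_danger, fear=0.2)
```

With F = 0.2, a strongly planned direction remains attractive (0.7 becomes 0.5) while a weakly planned one is cancelled, which matches the idea that the outcome depends on the strength of the competing need.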

The animat initially lacks Potential Energy and its level of Fear is fixed (E = 1,
Ep < 1, F = 0.2). When the Dangerous Area is absent, the animat systematically chooses
the closest resource (Ep1). However, when it is present, this inhibits the drive to go
towards the Ep1 resource and the final choice of the Ep resource should thus depend on
the importance of the lack of energy.

-Table 3 around here- 

Two series of 20 tests are carried out in order to induce conflicts between internal
states depending on Fear and Ep, respectively with a moderate (Ep = 0.5) and a strong
(Ep = 0.1) lack of Ep. As illustrated in Table 3, the inhibition generated by the Dangerous
Area in the first case is strong enough for the animat to select Ep2, despite the longer
route. In the second one, the need for Potential Energy is stronger and the animat, despite
the danger, selects Ep1. These two opposite tendencies are significantly different (Fisher's
exact probability test, p < 0.01).

This experiment shows that the animat may take risks in emergency situations and 
avoid them otherwise. But, more generally, it shows that it can exhibit, in an identical 
environmental configuration, different behavioral choices adapted to its conflicting internal 
needs, an essential property for a motivated animat. 

4.3.2 Synergetically interacting motivations 

-Figure 6 around here-



This task is inspired by a T-maze experiment proposed in Quoy et al. (2002) in order
to study the behavior generated by the coupling of two motivations. The left branch of the
T contains one Ep resource while the right one contains both an E and an Ep resource
(Figure 6). The length of the right branch is varied so that the ratio of the right branch
length to the left branch length is 1, 1.5 or 2. The animat is initially placed in the lower
branch of the T, with a motivation for both E and Ep (E = 0.5 and Ep = 0.5). The test
stops when the animat activates the ReloadEp action. In such a situation, the animat is
expected to systematically prefer the right branch, even if it is longer, because choosing
the left only satisfies the Ep need, while choosing the right can satisfy both E and Ep
needs.

-Table 4 around here- 

Three series of fifteen tests are carried out with branch length ratio values of 1, 1.5
and 2, with an animat that needs both E and Ep. As long as the ratio is not too high,
the cumulated activation generated by the two resources on the right is higher than the
drive generated by the single Ep resource on the left (Table 4, ratio 1 and 1.5). However,
when the two resources on the right are too far away, the drive they generate is attenuated
by distance and the animat becomes more and more attracted by the resource on the left
(Table 4, ratio 2).



The Gaussier et al. (2000) model of navigation integrates the notion of "preferred path"
by reducing the apparent distance between two nodes of the map when they are often used.
This allows the right branch to become preferred and thus systematically chosen over time.
Future development of our model should include such a habit learning capability.

5 Discussion

The proposed biomimetic model integrates both navigation and action selection, taking
into account the specificities of both the survival constraint and the variety of navigation
strategies. Simulations in benchmark environments validate 1) the survival advantage of
using path planning strategies, 2) the benefits of simultaneously using taxon and planning
strategies, along with the necessity of being able to forget when operating in changing
environments, and 3) the capability of the model to behave adaptively in the case of
conflicting and synergetic motivations.

5.1 From Rattus rattus... 

How the brain coordinates the interface between spatial maps, motivation, action selection
and motor control systems is of timely interest. The rat brain is widely investigated for this
purpose, but many issues remain to be clarified. By synthesizing observed mechanisms in
a behaving artificial system, our work helps to formulate several questions.

For example, our model points out the limitations of current neurobiological knowledge
concerning the actual role of NAcc core channels: do they represent, as in our model
and in e.g. Strösslin (2004), competing directions of movement? In Experiment 2.1, the
level of opportunism is fixed and does not adapt to changing conditions (whereas taxon
navigation is less reliable in poor lighting conditions), as the ventral loop selects one
direction taking into account all the navigation strategies. This could be changed by having
it select the most suitable strategy before a dorsal loop selects the direction of motion
based on the suggestion of the chosen strategy only. Such coding has recently received
support from the work of Mulder et al. (2004), on the basis of electrophysiological
recordings in hippocampal output structures associated with the NAcc and a nucleus of the
dorsal stream (ventromedial caudate nucleus). Another and more complex role may also
be considered: the NAcc core could interface goals, their location, their amount and the
corresponding motivations with information coming from several neural structures like other
limbic structures or CBGTC loops (Dayan and Balleine, 2000).



Likewise, our model questions the putative substrates of interactions between CBGTC
loops and their mode of operation, a subject of active current research. We may have
implemented the trans-subthalamic hypothesis in an exaggerated manner. In fact the overlap
of STN projections from various loops is rather limited (Kolomiets et al., 2003), while in
our model they extensively reach the whole output of the ventral loop. This choice was
indeed convenient for the role attributed here to the dorsal and ventral channels, respectively
coding for immobile and mobile actions. Recent results relative to interactions at the level
of BG output projections to dopaminergic nuclei in rats (Mailly et al., 2003) shed a new
light on the dopamine hierarchical pathway and could be the basis of an alternative model.
In the GPR, varying the dopamine level directly affects the ability to select; therefore, the
possibility that one loop may modulate the dopamine level of another one could be the
basis of an alternative mechanism for a loop to shunt another loop. Finally, one cannot
exclude the possibility that the resolution of selection conflicts in the CBGTC loops is
managed not only in the BG but also in downstream brainstem structures, for example in the
reticular formation (Humphries et al., this issue).

5.2 ...to Psikharpax

In Experiment 1, the planning animat (condition A) sometimes dies because of an imperfect
hand-tuning of the salience computations, which causes it to stop to reload too far away
from resources. The basal ganglia, in interaction with the dopaminergic system, are thought
to be the neural substrate of reinforcement learning. In order to avoid such problems in
the future, we are now adding such a mechanism for the automatic optimization of salience
computations to our model (Khamassi et al., 2004).



As mentioned in the introduction, this work contributes to the Psikharpax project, which
aims at building an artificial rat (Filliat et al., 2004). As it evolves, this artificial rat
will be endowed with more than the few motivations taken into account here, with the aim
of improving the actual autonomy of current robots, often devoted to a single task. The
development of polyvalent artifacts working in natural environments is indeed promising
for many applications in the home or in the office, as well as for future space programs with
unmanned missions. Our work also helps assess the operational value of the biomimetic
models used for this purpose.

A Appendix: Mathematical model description



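A.1 GPR structure

Activation ($a$) of every neuron of the model:

$$\tau \frac{da}{dt} = I - a \tag{3}$$

where $I$: input of the neuron, $\tau$: time constant ($\tau = 25$ ms). Corresponding output ($y$):

$$y = \begin{cases} 0 & \text{if } a < \varepsilon \\ m \times (a - \varepsilon) & \text{if } \varepsilon \le a \le \varepsilon + 1/m \\ 1 & \text{if } \varepsilon + 1/m < a \end{cases} \tag{4}$$

Values of $\varepsilon$ and $m$ for each nucleus are given in Table 5.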

-Table 5 around here- 
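
The neuron dynamics of eqns. 3 and 4 can be sketched numerically as follows. This is only an illustration under stated assumptions: the forward-Euler scheme, the 1 ms step `dt`, the constant input value and the function names `step` and `output` are choices made for this example and are not specified in the model description.

```python
TAU = 0.025  # time constant tau = 25 ms (eqn. 3), expressed in seconds


def output(a, eps, m):
    """Piecewise-linear output y of eqn. 4 (threshold eps, slope m)."""
    if a < eps:
        return 0.0
    if a < eps + 1.0 / m:
        return m * (a - eps)
    return 1.0


def step(a, inp, dt=0.001):
    """One forward-Euler step of tau * da/dt = I - a (eqn. 3).

    The Euler scheme and the step size dt are assumptions of this sketch.
    """
    return a + dt * (inp - a) / TAU


# Drive a D1 striatum neuron (eps = 0.2, m = 1 in Table 5) with a
# constant, arbitrarily chosen input of 0.7 for 1 s of simulated time.
a = 0.0
for _ in range(1000):
    a = step(a, 0.7)
y = output(a, eps=0.2, m=1.0)
print(a, y)
```

After many time constants the activation settles on the input value, and the output is the part of the activation above the threshold, scaled by the slope and clipped to [0, 1].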
In each module (D1 and D2 striatum subparts, STN, EP/SNr, GP, VL, TRN and
cortical feedback), the input of each channel $i$ is defined by equations 5 to 14, where
$N$: number of channels, $S_i$: salience of channel $i$, $\lambda$: dopamine level (0.2).

$$I^i_{D1} = (1 + \lambda) S_i - \sum_{j=0}^{N} y^j_{D1} \tag{5}$$

$$I^i_{D2} = (1 - \lambda) S_i - \sum_{j=0}^{N} y^j_{D2} \tag{6}$$

In our model of the ventral loop, lateral inhibitions (sum terms in eqns. 5 and 6) increase
with the angular difference between two channels. They are replaced in the ventral loop
by the following LI term:

$$LI_i = \sum_{\substack{j=0 \\ j \neq i}}^{N} \ldots \tag{7}$$

$$I^i_{STN} = S_i - y^i_{GP} \tag{8}$$

$$I^i_{EP} = -y^i_{D1} - 0.4\, y^i_{GP} + 0.8 \sum_{j=0}^{N} y^j_{STN} \tag{9}$$

The trans-subthalamic pathway is modelled by a modified input for the ventral EP/SNr
(v and d stand for ventral and dorsal):

$$I^i_{EPv} = -y^i_{D1v} - 0.4\, y^i_{GPv} + \ldots \sum_{j=0}^{N} \ldots + \ldots \sum_{j=0}^{N} \ldots \tag{10}$$

$$I^i_{GP} = -y^i_{D2} + 0.8 \sum_{j=0}^{N} y^j_{STN} \tag{11}$$

$$I^i_{VL} = y^i_{P} - y^i_{EP} - \ldots \sum_{j=0}^{N} y^j_{TRN} \tag{12}$$

$$I^i_{TRN} = y^i_{VL} + y^i_{P} \tag{13}$$

$$I^i_{P} = y^i_{VL} \tag{14}$$
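
To make the channel-wise structure of these inputs concrete, the sketch below evaluates the EP/SNr and GP inputs of eqns. 9 and 11 for a three-channel example. The output values used here are made up for illustration, and the function names `ep_input` and `gp_input` are hypothetical; only the coefficients 0.4 and 0.8 come from the equations.

```python
def ep_input(i, y_d1, y_gp, y_stn):
    """Input of EP/SNr channel i (eqn. 9).

    Channel-specific inhibition from D1 striatum and GP, plus diffuse
    excitation summed over all STN channels.
    """
    return -y_d1[i] - 0.4 * y_gp[i] + 0.8 * sum(y_stn)


def gp_input(i, y_d2, y_stn):
    """Input of GP channel i (eqn. 11)."""
    return -y_d2[i] + 0.8 * sum(y_stn)


# Illustrative (made-up) outputs for three channels; channel 1 has the
# strongest striatal activity.
y_d1 = [0.0, 0.5, 0.0]
y_d2 = [0.0, 0.3, 0.0]
y_gp = [0.2, 0.1, 0.2]
y_stn = [0.1, 0.2, 0.3]

ep = [ep_input(i, y_d1, y_gp, y_stn) for i in range(3)]
gp = [gp_input(i, y_d2, y_stn) for i in range(3)]
print(ep, gp)
```

In this example the channel with the strongest D1 activity receives the lowest EP/SNr input, while the diffuse STN term raises the input of all channels equally.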



A.2 Salience computations



The modification to the GPR model proposed in Girard et al. (2003) consisted in allowing,
for the computation of saliences, the use of sigma-pi neurons and of non-linear transfer
functions applied to the inputs. This was kept in the present model and is the origin of the
square roots and multiplications in the following equations.

A.2.1 Experiments 1 and 2

Dorsal loop saliences (E and Ep reloading actions):

$$S_E = 0.4 \times P_E + 1.2 \times A(E) \times m(E) + 0.6 \times \mathrm{mProx}(E) \times m(E) \tag{15}$$

$$S_{E_P} = 0.4 \times P_{E_P} + A(E_P) \times m(E_P) + 0.2 \times \mathrm{mProx}(E_P) \times m(E_P) \tag{16}$$
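
The dorsal saliences of eqns. 15 and 16 translate directly into code. The sketch below is a plain transcription under stated assumptions: the function and argument names are hypothetical and simply mirror the symbols P, A, m and mProx of the equations (whose meaning is as defined in the main text), and the input values are arbitrary.

```python
def salience_reload_e(p_e, a_e, m_e, mprox_e):
    """Eqn. 15: dorsal loop salience of the E reloading action."""
    return 0.4 * p_e + 1.2 * a_e * m_e + 0.6 * mprox_e * m_e


def salience_reload_ep(p_ep, a_ep, m_ep, mprox_ep):
    """Eqn. 16: dorsal loop salience of the Ep reloading action."""
    return 0.4 * p_ep + a_ep * m_ep + 0.2 * mprox_ep * m_ep


# Arbitrary illustrative inputs, identical for both actions.
s_e = salience_reload_e(p_e=0.0, a_e=1.0, m_e=0.5, mprox_e=1.0)
s_ep = salience_reload_ep(p_ep=0.0, a_ep=1.0, m_ep=0.5, mprox_ep=1.0)
print(s_e, s_ep)
```

With identical inputs, the larger coefficients of eqn. 15 give the E reloading action a higher salience than the Ep reloading action.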
Ventral loop salience for each direction $i$:

$$
\begin{aligned}
S_i ={}& 0.2 \times P_i + W_{plan} \sqrt{\mathrm{Plan}_i} \\
&+ 0.55 \sqrt{\mathrm{Prox}(E)_i \times m(E)} \\
&+ W^{E_P}_{taxon} \sqrt{\mathrm{Prox}(E_P)_i \times m(E_P)} \\
&+ 0.4 \times \mathrm{BKA}_i \times m(\mathrm{BKA}) \\
&+ \mathrm{Exp}_i \times (0.25 \\
&\quad + 0.05 \times (1 - \mathrm{mProx}(E_P)) \times m(E_P) \\
&\quad + 0.05 \times (1 - \mathrm{mProx}(E)) \times m(E))
\end{aligned}
\tag{17}
$$

where $W_{plan}$ and $W^{E_P}_{taxon}$ are respectively set to 0.65 and 0.55, except in experiment
4.2.1, where they take the values recorded in Table 2.
A.2.2 Experiment 3.1

Saliences of the dorsal loop are computed as in experiments 1 and 2. Ventral saliences are
modified to include the avoidance of dangerous areas:

$$
\begin{aligned}
S_i ={}& 0.2 \times P_i + 0.45 \sqrt{\mathrm{Plan}_i} \\
&+ 0.35 \sqrt{\mathrm{Prox}(E)_i \times m(E)} \\
&+ 0.35 \sqrt{\mathrm{Prox}(E_P)_i \times m(E_P)} \\
&+ 0.19 \times (1 - \mathrm{Prox}(\mathrm{DA})_i) \times m(\mathrm{DA}) \\
&+ 0.4 \times \mathrm{BKA}_i \times m(\mathrm{BKA}) \\
&+ \mathrm{Exp}_i \times (0.05 \\
&\quad + 0.05 \times (1 - \mathrm{mProx}(E_P)) \times m(E_P) \\
&\quad + 0.05 \times (1 - \mathrm{mProx}(E)) \times m(E))
\end{aligned}
\tag{18}
$$

A.2.3 Experiment 3.2

Experiment 3.2 showed that the weight of the dorsal computations had to be lowered:

$$S_E = 0.4 \times P_E + 0.9 \times A(E) \times m(E) + 0.1 \times \mathrm{mProx}(E) \times m(E) \tag{19}$$

$$S_{E_P} = 0.4 \times P_{E_P} + 0.9 \times A(E_P) \times m(E_P) + 0.1 \times \mathrm{mProx}(E_P) \times m(E_P) \tag{20}$$

The ventral salience computations from experiments 1 and 2 risked stopping the animat
too far from resources. As this problem arose systematically in experiment 3.2, the
$0.65 \sqrt{\mathrm{Plan}_i}$ term was changed for
$0.55 \sqrt{\mathrm{Plan}_i} \times (1 - \mathrm{mProx}(E)) \times (1 - \mathrm{mProx}(E_P))$.
Table 1: Comparison (U-test) of experiments testing median survival duration of animats
in conditions A (taxon and topological navigation) and B (taxon navigation only).



Durations (s)   Median     Range

A               14431.5    2531 : 17274
B               4908.0     2518 : 8831

U test          U = 15     p < 0.01
Table 2: Resource choice depending on the relative weighting of the two navigation
strategies in the salience computation. $W_{plan}$ and $W^{E_P}_{taxon}$: weights related to planning
and taxon navigation strategies respectively (see eqn. 17).

Weights                              Choices
$W_{plan}$    $W^{E_P}_{taxon}$      Ep1    Ep2

0.65          0.55                   13     2
0.55          0.55                   7      8
0.45          0.55                   2      13
Table 3: Resource choice depending on the initial Ep level.

Internal state     Incidence of choices
F      Ep          Ep1    Ep2

0.2    0.1         13     7
0.2    0.5         2      18

Fisher's test      p < 0.01
Table 4: Branch choices depending on the length ratio.

         Incidence of
         first choice
Ratio    Left    Right

1        3       12
1.5      4       11
2        8       7
Table 5: Parameters of the transfer functions of the GPR model.

GPR Module     ε        m

D1 Striatum    0.2      1
D2 Striatum    0.2      1
STN            -0.25    1
GP             -0.2     1
EP/SNr         -0.2     1
Ctx                     1
TRN                     0.5
VL             -0.8     0.62
Figure Captions

Figure 1: The GPR model. Nuclei are represented by boxes; each circle in these nuclei
represents an artificial leaky-integrator neuron. On this diagram, three channels are competing
for selection, represented by the three neurons in each nucleus. The second channel is
represented by gray shading. For clarity, only the projections from the second channel neurons
are represented; they are similar for the other channels. White arrowheads represent
excitations and black arrowheads, inhibitions. D1 and D2: neurons of the striatum with two
respective types of dopamine receptors; STN: subthalamic nucleus; GP: globus pallidus;
EP/SNr: entopeduncular nucleus and substantia nigra pars reticulata; VL: ventrolateral
thalamus; TRN: thalamic reticular nucleus. Dashed boxes represent the three subdivisions
of the model proposed by its authors (Selection, Control of selection and thalamo-cortical
feedback or TCF); note that these subdivisions appear on the simplified sketch of Figure 2.

Figure 2: Final model structure. Input variables are exhaustively listed; 36-component
vectors are in bold type. The excitatory projections from the STN of the dorsal loop to the
EP/SNr of the ventral loop, which are the substrate for the coordination of the loops, are
highlighted.

Figure 3: Experiment 1 environment. Initial position and orientation are represented
by the schematic animat. E: Energy resource; Ep: Potential Energy resource.

Figure 4: Experiment 2 environment. Initial position and orientation are represented by
the schematic animat. Ep: Potential Energy resource; Ep2 is absent in some experiments,
see text.

Figure 5: Experiment 3 environment. Initial position and orientation are represented
by the schematic animat. Ep1,2: Potential Energy resources; DA: dangerous area.

Figure 6: The three environments of experiment 4. The ratio of the right branch
length to the left branch length varied between 1 and 2. Initial position and orientation
are represented by the schematic animat. Ep1,2: Potential Energy resources; E: Energy
resource.